Information Extraction from Multiple Syntactic Sources
Author
Abstract
Information Extraction (IE) is the automatic extraction of facts from text, including the detection of named entities, entity relations, and events. Conventional approaches to IE look for syntactic patterns produced by deep processing of text, such as partial or full parsing. The problem these approaches face is that the deeper the analysis, the less accurate its output, and errors introduced at that stage cannot be recovered from. Lower-level processing, by contrast, is more accurate and also provides useful information, but conventional approaches cannot incorporate this kind of information efficiently. This thesis describes a novel supervised approach based on kernel methods to address these issues. In this approach, customized kernels are used to match the syntactic structures produced by different preprocessing phases. Using the closure properties of kernels, the individual kernels are combined into a composite kernel that integrates and extends all of this information. Composite kernels can be used with various classifiers, such as Nearest Neighbor or Support Vector Machines (SVM). The main classifier we propose is the SVM, because of its ability to generalize in high-dimensional feature spaces. We will show that each level of syntactic information can contribute to IE tasks, and that low-level information can help recover from errors in deep processing.
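The idea of combining per-level kernels into one composite kernel can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the two toy kernels (`token_kernel` for low-level token overlap, `bigram_kernel` as a crude stand-in for a structural kernel), the mixing weight `alpha`, the labels, and the example sentences. It relies only on the standard fact that a non-negatively weighted sum of valid kernels is itself a valid kernel, and it uses a Nearest Neighbor classifier over the kernel-induced distance rather than an SVM so that it needs no external library.

```python
from collections import Counter


def token_kernel(a, b):
    """Low-level kernel (hypothetical): dot product of bag-of-words counts."""
    ca, cb = Counter(a), Counter(b)
    return float(sum(ca[t] * cb[t] for t in ca))


def bigram_kernel(a, b):
    """Stand-in for a structural kernel (hypothetical): dot product of
    adjacent-token-pair counts."""
    ca, cb = Counter(zip(a, a[1:])), Counter(zip(b, b[1:]))
    return float(sum(ca[g] * cb[g] for g in ca))


def composite_kernel(a, b, alpha=0.5):
    """Weighted sum of the two kernels. With 0 <= alpha <= 1 both weights
    are non-negative, so the result is again a valid kernel."""
    return alpha * token_kernel(a, b) + (1.0 - alpha) * bigram_kernel(a, b)


def kernel_distance_sq(a, b, kernel):
    """Squared distance in the feature space implied by the kernel:
    ||phi(a) - phi(b)||^2 = K(a, a) - 2 K(a, b) + K(b, b)."""
    return kernel(a, a) - 2.0 * kernel(a, b) + kernel(b, b)


def nearest_neighbor(query, training, kernel=composite_kernel):
    """Return the label of the training example closest to the query."""
    best = min(training, key=lambda ex: kernel_distance_sq(query, ex[0], kernel))
    return best[1]


# Toy training data; sentences and labels are invented for illustration.
TRAINING = [
    ("acme hired john smith".split(), "EMPLOY"),
    ("john smith works for acme".split(), "EMPLOY"),
    ("paris is located in france".split(), "LOCATED"),
    ("the office is located in boston".split(), "LOCATED"),
]

print(nearest_neighbor("ibm hired mary jones".split(), TRAINING))      # EMPLOY
print(nearest_neighbor("tokyo is located in japan".split(), TRAINING))  # LOCATED
```

Because the classifier only ever calls `kernel(a, b)`, the same composite function could in principle be handed to any kernel-based learner, including an SVM implementation that accepts a precomputed kernel matrix.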
Similar resources
Exploiting Rich Syntactic Information for Relation Extraction from Biomedical Articles
This paper proposes a ternary relation extraction method primarily based on rich syntactic information. We identify PROTEIN-ORGANISM-LOCATION relations in the text of biomedical articles. Different kernel functions are used with an SVM learner to integrate two sources of information from syntactic parse trees: (i) a large number of syntactic features that have been shown useful for Semantic Rol...
Exploiting Rich Syntactic Information for Relationship Extraction from Biomedical Articles
This paper proposes a ternary relation extraction method primarily based on rich syntactic information. We identify PROTEIN-ORGANISM-LOCATION relations in the text of biomedical articles. Different kernel functions are used with an SVM learner to integrate two sources of information from syntactic parse trees: (i) a large number of syntactic features that have been shown useful for Semantic Rol...
Combining Multiple Layers of Syntactic Information for Protein-Protein Interaction Extraction
Protein-protein interaction extraction is a challenging information extraction task in the BioNLP field. Several kernels focusing on a part of syntactic information have been proposed for the task. In this paper, we propose a method to combine multiple layers of syntactic information by using a combination of multiple kernels based on several different parsers. We evaluated the method using sup...
Abstracts
Contents: The systematic extraction model of the knowledge sources and tools from The Holy Quran / Ali Mowlaei, Mahdi Golshani; Critical Analysis of Epistemological Principles of Cartesian Humanism Based on Allameh Mohammad-Taqi Ja'fari's Thoughts / Narges Aboul-Qasemian, Abdollah Nasri, Fazlollah Khaleghian ...
Syntactic Structures and Rhetorical Functions of Electrical Engineering, Psychiatry, and Linguistics Research Article Titles in English and Persian: A Cross-linguistic and Cross-disciplinary Study
A research article (RA) title is the first and foremost feature that attracts the reader's attention, the feature from which she/he may decide whether the whole article is worth reading. The present study attempted to investigate syntactic structures and rhetorical functions of RA titles written in English and Persian and published in journals in three disciplines of Electrical Engineering, Psy...